procedure and inter-subtest branching, (2) evaluation of the effects of
different intra-subtest termination criteria, (3) use of classical regression
equations and regression equations corrected for errors of measurement in the
predictors, and (4) cross-validation stability of the inter-subtest branching
regression predictions. Data consisted of the responses from 1,600 students
to classroom-administered final exams in a general biology course at the
University of Minnesota.

Total test length was reduced by 16% to 30% using the adaptive intra-subtest
item selection strategy with a variable termination criterion that omits
those items providing little information to the measurement process.
Subtest-length reduction ranged from about 8% to 62%. Total test length
was reduced another 1% to 5% (with subtest-length reductions of up to 53%)
upon the addition of an inter-subtest branching strategy that utilized
regression equations with prior information concerning a student's performance.

Reductions in subtest length were accomplished with virtually no loss in
psychometric information. Correlations between the Bayesian achievement
estimates from the adaptive and conventional tests were uniformly high,
typically r = .90 and higher. Results showed that the use of the corrected
regression equations did little to improve the performance of the inter-subtest
branching; although the multiple correlations for the corrected
equations were higher, both the information curves and correlations of
achievement estimates were generally lower. Cross-validation results
indicated that the procedure can be used in different samples from the same
population.

Results  from  this  study  generally  supported  the  generality  of  this 
adaptive  testing  strategy  for  reducing  achievement  test  length  with  no 
adverse  impact  on  the  quality  of  the  measurements.  Suggestions  are  made  for 
further research with this testing strategy.
Efficiency of an Adaptive Inter-Subtest Branching Strategy
In  the  Measurement  of  Classroom  Achievement 


The development of adaptive testing technology has traditionally taken
place within the context of ability measurement. Indeed, much of the adaptive
testing research has been concerned with the application of the various
adaptive testing strategies to the measurement of a single unidimensional ability
domain (e.g., Betz & Weiss, 1974, 1975; Larkin & Weiss, 1974, 1975; Lord, 1977;
McBride & Weiss, 1976; Urry, 1977; Vale & Weiss, 1975; Weiss, 1973). More
recently, Bejar and Weiss (1978); Bejar, Weiss, and Gialluca (1977); Bejar, Weiss,
and Kingsbury (1977); and Kingsbury and Weiss (1979) have demonstrated the
applicability of these unidimensional adaptive testing strategies to the
measurement of classroom achievement. Frequently, however, achievement tests include
items drawn from several distinct content areas. Hence, the assumption of
unidimensionality of the entire set of items constituting an achievement test may
be untenable, and the application of unidimensional testing strategies
inappropriate.

Although Reckase (1978) has shown that the first factor of a multidimensional
achievement test will be related to the item characteristic curve (ICC)
item parameter estimates from the three-parameter ICC model, in many cases the
first factor will account for only a small portion of the common variance of
the achievement test items, and even smaller portions of the total variance of
the test. Thus, application of a unidimensional ICC model to a multidimensional
achievement test will result in achievement level estimates that reflect
achievement on only a small subset of course content. In addition, the
diagnostic information regarding a student's performance on specific course content
areas is lost to both student and instructor by measuring achievement on only
one dimension.

In  an  attempt  to  design  an  adaptive  testing  strategy  that  would  reduce 
testing  time,  yet  retain  the  capability  of  providing  students  and  instructors 
with  scores  on  the  separate  subtests  in  an  achievement  domain,  Brown  and  Weiss 
(1977)  proposed  a  testing  strategy  specifically  designed  for  achievement  test 
batteries  that  are  composed  of  multiple  content  areas.  It  included  provisions 
for  adaptive  branching  between  subtests  as  well  as  for  adaptive  item  selection 
within  subtests,  in  an  attempt  to  adapt  the  test  battery  to  each  examinee  most 
efficiently.  Brown  and  Weiss  (1977)  applied  the  combined  inter-subtest  and 
intra-subtest  adaptive  strategy  in  a  real-data  simulation  using  a  military 
achievement  test  battery.  They  observed  a  mean  reduction  in  test  battery  length 
of  nearly  50%,  accompanied  by  a  minimal  loss  in  psychometric  information. 

Purpose 

The present study investigated the efficacy of this adaptive testing
strategy when it was applied to a classroom achievement test in a different kind of
testing environment. Further, this study evaluated the relative contributions
of the intra-subtest item selection and inter-subtest branching strategies in
terms of

1.  The  number  of  items  administered  in  each  subtest  of  the  battery 
and  in  the  test  as  a  whole, 

2. Reduction in test length when compared to the length of a conventionally
administered examination,

3.  Correlations  between  achievement  estimates  derived  from  the  adaptive 
strategies  with  those  obtained  from  the  conventional  examination,  and 

4.  Effects  of  adaptive  administration  on  psychometric  information. 

In  addition,  this  study  included  an  investigation  of  the  effects  of  using  the 
adaptive  inter-subtest  branching  strategy  developed  from  one  set  of  data  on  a 
different  data  set,  using  a  double-cross-validation  design. 


METHOD 

Procedure 


Test  Items  and  Subjects 

Real-data simulation techniques were applied to the item responses of 800
students who were administered the final examination in General Biology,
Biology 1-011, an introductory lecture and laboratory class at the University of
Minnesota, during the fall academic quarter of 1977, and to the responses of
another 800 biology students from winter quarter of 1978.

Each of these final examinations was 110 items long and was administered
conventionally by paper and pencil at the end of the academic quarter. However,
each student was directed to answer only 100 of the questions and was free to
omit any 10 items of his/her choice. Additionally, only the responses to those
items from five content areas (Chemistry, Cell, Energy, Reproduction, and
Ecology) were used for this study. The numbers of items in each content area
differed slightly across the two quarters; the distribution of items across
content areas for the two quarters is shown in Table 1. Each of these five
content areas formed a subtest used for the branching strategy discussed below.

Item  Parameterization 


Items were parameterized within content areas using Urry's (1976) ESTEM
computer program for latent trait item parameterization employing the
three-parameter logistic model. This program provides estimates of the ICC item
discrimination (a), item difficulty (b), and lower asymptote (c) parameters.

Urry's item parameterization program calculates item parameter estimates
using a two-stage procedure. In the first stage, initial item parameter
estimates are determined for all items. However, these initial item parameter
estimates are not reported for an item if one or more of the following
conditions holds: (1) a < .80, (2) b < -4.00, (3) b > 4.00, or (4) c > .30. In
the second stage, item parameters are recomputed for all items that are not
excluded by the criteria applied in the first stage. In this stage, item
parameter estimates are reported without restrictions (e.g., c may be greater than
.30 for some items in the second stage) for all items not excluded in the first
stage.
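
The first-stage screen can be illustrated with a short filter. This is only a sketch, not Urry's program: the function name `passes_first_stage` and the sample estimates are hypothetical, and the fourth condition is read here as a bound on the lower asymptote (c > .30), in line with the remark that c may exceed .30 only in the second stage.

```python
def passes_first_stage(a, b, c):
    """Return True if initial estimates (a, b, c) survive the first-stage screen.

    Assumed exclusion rules: a < .80, b < -4.00, b > 4.00, or c > .30.
    """
    return not (a < 0.80 or b < -4.00 or b > 4.00 or c > 0.30)

# Hypothetical initial estimates, for illustration only (not items from the study).
initial = {
    "item1": (1.56, -0.49, 0.25),  # passes every condition
    "item2": (0.60, 0.10, 0.20),   # excluded: discrimination below .80
    "item3": (1.20, 4.50, 0.15),   # excluded: difficulty above 4.00
}
retained = [name for name, est in initial.items() if passes_first_stage(*est)]
```

Only items in `retained` would go on to the second stage, where their parameters are re-estimated without these restrictions.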

The items were parameterized at the peak of training; that is, items in
each content area were parameterized using test data obtained soon after
instruction in that content area took place. Items in content areas Chemistry,
Cell, and Energy were parameterized at the time of Midquarter 1 (MQ1), and
items in content areas Reproduction and Ecology were parameterized at the time
of Midquarter 2 (MQ2). Item parameter estimates were obtained from classroom
examination data from winter quarter of 1976 through spring quarter 1977. The
minimum sample size for parameter estimation for any one item was 844; most
item parameter estimates were based on data from 1,000 to 2,000 students.

Conventional  Test 


A conventionally administered test was used for comparison with the
adaptive testing strategies. The subtests were administered in the same order for
both the conventional and adaptive strategies. In the conventional test all
items within each subtest were administered sequentially, with all students
taking all the items, and all items were administered in the same order. There
was, then, no differential entry point for the subtests when administered
conventionally. Bayesian scoring (Owen, 1975) was used for each of the
conventional subtests, using a mean of 0.0 and a prior variance of 1.0 as the initial
prior estimate of the Bayesian score for each subtest.

Adaptive  Tests 

As in the Brown and Weiss (1977) study, an adaptive testing strategy
utilizing both intra-subtest adaptive item selection and inter-subtest
branching was used, in conjunction with a variable termination criterion. This was
done in order to reduce to a minimum the number of items administered to each
student, while causing minimal change in the measurement characteristics of
the whole test.

As in the conventional test, a Bayesian achievement estimate ($\hat{\theta}$) was
obtained for each student after the administration of every item. Item selection
within each subtest was based on the concept of item information as described
by Birnbaum (1968). Items were selected within a subtest for each student by
computing the value of item information for every unadministered item at the
current level of $\hat{\theta}$ for that student. The item selected for administration was
the item that had the highest item information value at that level of $\hat{\theta}$; once
an item was administered to a student, it was eliminated from the subtest pool
of available items for that student. The selected item was administered, the
student's response was scored, and a new $\hat{\theta}$ estimate was obtained. Then a new
item was selected, and the procedure was repeated.
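
The selection rule can be sketched in a few lines. This is a minimal illustration rather than the study's program: it assumes the three-parameter logistic form with a scaling constant D = 1.7 (not stated in the text), uses the standard expression for Birnbaum's item information under that model, and the function names and the three-item pool are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant; the text does not state one

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information for the three-parameter logistic model."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(theta, pool, administered):
    """Index of the unadministered item with the highest information at theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    if not candidates:
        return None  # pool exhausted
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Hypothetical pool of (a, b, c) triples, not taken from the study.
pool = [(1.5, -1.0, 0.2), (1.8, 0.0, 0.2), (1.6, 1.2, 0.2)]
first = select_item(0.0, pool, administered=set())
```

With these made-up parameters, the item whose difficulty is closest to the current estimate and whose discrimination is highest is chosen first; once it is marked as administered, the next call picks among the remaining items.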

Testing continued within each subtest until one of the following conditions
occurred: (1) all the items within the subtest pool were administered; or (2) no
item remaining in the pool provided information at the current level of $\hat{\theta}$ that
exceeded some predetermined small amount of information. Two such values of
information were used in this study: .01 and .05. Further detail regarding
item selection and achievement estimation can be found in Brown and Weiss (1977).
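
The two termination conditions can be seen in a sketch of the full within-subtest loop. Several things here are assumptions made for illustration: the posterior is updated numerically on a grid instead of with Owen's (1975) closed-form Bayesian update, the logistic scaling constant D = 1.7 is assumed, and all names and item parameters are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant
GRID = [-4.0 + 0.05 * k for k in range(161)]  # theta grid for the numerical posterior

def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def posterior_moments(weights):
    """Mean and variance of the discretized posterior."""
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, GRID)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, GRID)) / total
    return mean, var

def adaptive_subtest(pool, responses, prior_mean=0.0, prior_var=1.0, min_info=0.01):
    """Simulate one subtest; responses[i] is the recorded 0/1 answer to item i.

    Returns (theta_hat, posterior_variance, indices of administered items).
    """
    weights = [math.exp(-0.5 * (t - prior_mean) ** 2 / prior_var) for t in GRID]
    theta, var = prior_mean, prior_var
    administered = []
    while len(administered) < len(pool):  # condition (1): pool exhausted
        remaining = [i for i in range(len(pool)) if i not in administered]
        best = max(remaining, key=lambda i: item_information(theta, *pool[i]))
        if item_information(theta, *pool[best]) <= min_info:
            break  # condition (2): no remaining item exceeds the criterion
        administered.append(best)
        a, b, c = pool[best]
        weights = [w * (p_correct(t, a, b, c) if responses[best]
                        else 1.0 - p_correct(t, a, b, c))
                   for w, t in zip(weights, GRID)]
        theta, var = posterior_moments(weights)
    return theta, var, administered
```

Calling `adaptive_subtest(pool, responses, min_info=0.05)` instead of the default `.01` corresponds to the stricter of the two criterion values and can only shorten the subtest.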

Inter-Subtest  Branching 

Subtest ordering. Following the proposal by Brown and Weiss (1977), linear
multiple regression was used to determine the order of administration of the
subtests. Brown and Weiss, however, ordered subtests based on the linear
regression of number-correct scores. In this study a Bayesian achievement estimate,
using an assumed normal prior distribution with a mean of 0.0 and a variance of
1.0, was calculated for each student on each of the five subtests of the final
examination. These five scores were then intercorrelated, and their
intercorrelation matrix was used as the basis for inter-subtest branching. This
procedure was used for the data from each of the two academic quarters separately.

The highest bivariate correlation was selected from this intercorrelation
matrix (for each quarter), and one of the two subtests was arbitrarily
designated to be administered first; the other was administered second. Multiple
correlations were then computed using these two subtests as predictor variables
and each of the other subtests, in turn, as the criterion variable. The subtest
having the highest multiple correlation with the first two subtests was
designated as the third test to be administered. This procedure was repeated to
select the fourth subtest to be administered, selecting that subtest which had
the highest multiple correlation with the previous three subtests. This process
was continued until all five subtests were ordered and was repeated separately
for each of the two quarters.
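
The ordering procedure is a greedy search over the intercorrelation matrix, and can be sketched as follows. The helper names (`solve`, `squared_multiple_r`, `order_subtests`) and the four-variable correlation matrix are hypothetical; the squared multiple correlation is computed with the standard identity R^2 = r'(Rxx)^-1 r, which is an assumption about how the computation could be done, not a description of the original program.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def squared_multiple_r(R, predictors, criterion):
    """Squared multiple correlation of one variable on a set of predictors."""
    Rxx = [[R[i][j] for j in predictors] for i in predictors]
    rxy = [R[i][criterion] for i in predictors]
    beta = solve(Rxx, rxy)  # standardized regression weights
    return sum(b * r for b, r in zip(beta, rxy))

def order_subtests(R):
    """Greedy administration order from a subtest intercorrelation matrix."""
    n = len(R)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    first, second = max(pairs, key=lambda p: R[p[0]][p[1]])
    order = [first, second]  # which member of the pair goes first is arbitrary
    while len(order) < n:
        rest = [k for k in range(n) if k not in order]
        order.append(max(rest, key=lambda k: squared_multiple_r(R, order, k)))
    return order

# Hypothetical 4 x 4 correlation matrix, not data from the study.
R = [[1.0, 0.3, 0.1, 0.5],
     [0.3, 1.0, 0.4, 0.6],
     [0.1, 0.4, 1.0, 0.2],
     [0.5, 0.6, 0.2, 1.0]]
order = order_subtests(R)
```

In this made-up example the pair (1, 3) has the highest correlation and starts the order; variable 0 is then added because it is better predicted from that pair than variable 2 is.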

Differential subtest entry points. After administration of the first
subtest, each student's entry points for the second and subsequent subtests were
differentially determined. For the first subtest each student's prior
achievement level was assumed to be $\theta$ = 0.0. That is, it was assumed that the student's
achievement level was at the mean of the estimated $\theta$ distribution, since there
was no previous information to indicate otherwise. The initial item administered
from the first subtest was that item providing the most information at $\theta$ = 0.0;
hence, all students began the first subtest with the same test item.

The entry point into the item pool for the second subtest was determined
from the bivariate regression of scores from Subtest 2 on Subtest 1 and the
student's $\hat{\theta}$ at the end of Subtest 1 ($\hat{\theta}_1$). The value of $\hat{\theta}_1$ for each student was
entered into the bivariate regression equation for predicting the second subtest
score from the score on the first subtest. This yielded an estimate for that
student's score on Subtest 2, which was then used as the initial Bayesian prior
$\theta$ for intra-subtest item selection in Subtest 2. The item that provided the
most information at this predicted level of $\theta$ was administered as the first item
in the second subtest. The squared standard error of estimate from the
bivariate regression equation was used as an estimate of the initial Bayesian prior
variance of this entry-level achievement estimate.

Determination of the entry point for the third and subsequent subtests was
simply a generalization of the method used for the second subtest. In general,
the student's final achievement level estimates from all n previously
administered subtests were entered into the multiple regression equation for predicting
the next (n + 1st) subtest score from scores on the previous n subtests. This
predicted achievement level estimate was used as the initial Bayesian prior $\theta$
for intra-subtest branching within that subtest. The squared standard error
of estimate from each regression was used as the initial Bayesian prior variance
for each subtest.
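
For the two-subtest case, the entry-point computation can be sketched as below. The function names and the four data points are hypothetical, and the standard error of estimate is taken as the criterion standard deviation times the square root of 1 - r^2, which is an assumption about the exact formula used.

```python
import math

def fit_bivariate(x, y):
    """Least squares regression of y on x; returns (intercept, slope, SEE)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    see = math.sqrt(syy / (n - 1)) * math.sqrt(1.0 - r2)  # assumed SEE formula
    return intercept, slope, see

def entry_prior(theta1, intercept, slope, see):
    """Prior mean and variance for Subtest 2, given the Subtest 1 estimate."""
    return intercept + slope * theta1, see ** 2

# Hypothetical Subtest 1 and Subtest 2 estimates for four students.
subtest1 = [-1.0, 0.0, 1.0, 2.0]
subtest2 = [-0.5, 0.1, 0.4, 1.2]
intercept, slope, see = fit_bivariate(subtest1, subtest2)
prior_mean, prior_var = entry_prior(1.0, intercept, slope, see)
```

The pair `(prior_mean, prior_var)` would replace the fixed (0.0, 1.0) prior when item selection starts in the second subtest; the multiple-regression case for later subtests follows the same pattern with more predictors.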

Corrected regression equations. In addition to the classical multiple
regression equations, a second set of equations was used to determine entry-level
achievement estimates for each subtest. This second set of equations was
applied to the data from fall and winter final exams in exactly the same manner
as described above; the only difference between the two procedures was in the
way the equations were obtained. The results from use of the two kinds of
regression equations were then compared.

The use of the second set of regression equations was studied because
classical regression techniques were somewhat inappropriate for this set of data.
In the general linear model of regression, the expected value of the dependent
variable $y$ is expressed as the "best" (in the least squares sense) weighted sum
of $p$ independent variables $x_i$ ($i = 1, \ldots, p$). It is assumed that $y$ is randomly
distributed with $n$ independent observations $y_j$ ($j = 1, \ldots, n$), with common
variance $\sigma^2$, and that the independent variables $x_i$ are measured without error
(Neter & Wasserman, 1974).


However, the original Bayesian $\hat{\theta}$ values used in this regression, obtained
for each subtest of the final exam, were not measured without error. Indeed,
for each of these Bayesian estimates, there was a corresponding value for the
Bayesian posterior variance, which can be interpreted as an index of the
variation inherent in the estimate itself. Hence, any classical regression
procedure using these estimates is somewhat in error.


Lawley and Maxwell (1973) and Maxwell (1975) have discussed the effects
such errors have on the regression equation and the multiple correlation
coefficient. In their discussions, the general linear equation is expressed as

$$y_j = \alpha + \beta_1 (x_{j1} - \bar{x}_1) + \cdots + \beta_p (x_{jp} - \bar{x}_p) + e_j ,$$   [1]

where

$\alpha$ is a constant;
the $\beta$'s are the partial regression coefficients;
$\bar{x}_i$ is the mean of $x_{ji}$ over all $j$; and
$e_j$ is the random error of measurement in $y_j$.

The estimation equation, found by the method of least squares (where $\sum_j e_j^2$ is
minimized), can be written as

$$\hat{y}_j = \bar{y} + \hat{\beta}_1 (x_{j1} - \bar{x}_1) + \cdots + \hat{\beta}_p (x_{jp} - \bar{x}_p) ,$$   [2]

where $\bar{y}$ is the mean of the $n$ observations of $y_j$ ($j = 1, \ldots, n$) and $\hat{y}_j$ is the
predicted value of the dependent variable $y_j$.

Given that $X$ is a matrix of order $n \times p$ of $x$ values (deviation scores
$x_{ji} - \bar{x}_i$), the vector of regression weights is estimated by

$$\hat{\beta} = (X'X)^{-1} X'Y ,$$   [3]

where $Y$ is a column vector of elements $y_j$ and $X'$ is the transpose of $X$.
The error variance $\sigma_e^2$ (where $e_j = y_j - \hat{y}_j$) is estimated by

$$s_e^2 = \sum_j e_j^2 / (n - p - 1) ,$$   [4]

and the estimates of the error variances of the $\hat{\beta}$s are given by the respective
diagonal elements of the covariance matrix

$$\operatorname{cov}(\hat{\beta}) = (X'X)^{-1} s_e^2 .$$   [5]


The above equations assume that the independent variables are measured
without error. To the extent that this is not true, the estimates of their
variances will be inflated. That is, the diagonal elements of the matrix $X'X$
will be larger than they should otherwise be. In addition, since the $x$'s are
random variables chosen as plausible predictors of $y$, it is possible (even
probable) that the estimate of error variance $s_e^2$ (Equation 4) will be an
overestimate of the true error variance of the $y_j$'s.


The first of these effects comes into play when estimating the values of
the regression coefficients in Equation 3. Because that equation involves the
inverse of the matrix $X'X$, the regression coefficients are necessarily
underestimated. Both of the effects mentioned above play a part in the estimation
of the covariance matrix in Equation 5. There can never be certainty that these
effects will cancel out each other. Maxwell (1975) cautions:


In summary we see that inadequate specification of $y$ and errors of
measurement in the $x$'s lead to a situation in which the tests of
significance provided for the classic model are of dubious validity in
most social science applications. At best we can claim that, if $e_j$
are calculated and found to be approximately normally distributed, a
significant multiple correlation coefficient would indicate some
dependence of $y$ on a weighted sum of the $x$'s. But the relative sizes
of the regression weights would be suspect and the magnitude of the
multiple correlation coefficient in particular would be the point to
note. (pp. 52-53)


Both Lawley and Maxwell (1973) and Maxwell (1975) show how such errors of
measurement in the $x$'s can be handled by stating the model in factor analytic
terms and proceeding from there. Essentially, the set of predictor variables
is reduced to a "best" set of statistically independent variables (i.e., the
factors), and then the dependent variable is predicted from these. Specifically,
the analysis proceeded as follows:


The maximum likelihood estimate of the correlation matrix is given by

$$\Sigma^* = \Lambda^* \Lambda^{*\prime} + \Psi^* ,$$   [6]

where

$\Sigma^*$ (of order $1 + p$) includes the dependent variable $y$ together with the
$p$ independent variables,
$\Lambda^*$ is a $(1 + p) \times k$ matrix of factor loadings of all the variables on
the $k$ factors, and
$\Psi^*$ is a diagonal matrix of residual variances.

Partitioning $\Lambda^*$ as

$$\Lambda^* = \begin{bmatrix} \lambda' \\ \Lambda \end{bmatrix} ,$$   [7]

where $\lambda'$ contains the loadings of $y$ on the factors and $\Lambda$ contains the
corresponding loadings of the $x$'s, yields the regression equation

$$y = \lambda' f .$$   [8]

Estimating the factors $f$ in this equation (see Maxwell, 1975, p. 59) yields
the new regression equation

$$\hat{y} = \lambda' \Gamma^{-1} \Lambda' \Psi^{-1} x ,$$   [9]

where $\Gamma = \Lambda' \Psi^{-1} \Lambda$ is a diagonal matrix. In this approach, the square of the
multiple correlation coefficient for the $y$'s predicted from the $x$'s is given by
the communality of $y$ in the maximum likelihood factor analysis.
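
For a single factor, the matrix $\Gamma$ reduces to a scalar and the corrected weights are easy to compute by hand. The sketch below assumes standardized variables, takes $\Psi$ to be the residual variances of the predictors only, and uses made-up loadings; `corrected_weights` is a hypothetical name, not a routine from the study.

```python
def corrected_weights(lam_y, lam_x, psi_x):
    """Weights w such that y-hat = sum(w_i * x_i) in the one-factor case.

    lam_y : loading of the criterion y on the factor
    lam_x : loadings of the p predictors on the factor
    psi_x : residual variances of the p predictors (assumed meaning of Psi)
    """
    # Gamma = Lambda' Psi^-1 Lambda, a scalar when there is one factor
    gamma = sum(l * l / p for l, p in zip(lam_x, psi_x))
    return [lam_y * (l / p) / gamma for l, p in zip(lam_x, psi_x)]

# Hypothetical one-factor solution for a criterion and two predictors.
lam_y = 0.8
lam_x = [0.7, 0.6]
psi_x = [1.0 - l * l for l in lam_x]  # residual variances of standardized predictors
weights = corrected_weights(lam_y, lam_x, psi_x)
r_squared = lam_y ** 2  # communality of y in the one-factor model
```

A quick consistency check on such weights is that the weighted sum of the predictor loadings reproduces the criterion loading, which follows directly from the definition of $\Gamma$.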

For this study, maximum likelihood factor analyses were performed
separately on the 3 x 3, 4 x 4, and 5 x 5 $\Sigma^*$ matrices corresponding to the 2, 3, and
4 independent variable cases, respectively (the dependent variable $y$ is
always included in the $\Sigma^*$ matrix). The matrices from a one-factor solution were
obtained in each case and Equation 9 was calculated for predicting scores on
Subtests 3, 4, and 5, respectively, from the scores on all previously
administered subtests.

To examine the effect of using the corrected (versus the classical)
regression equations, the subtests were administered in the same order for
inter-subtest branching as they were for the classical equations. Since factor
analyses cannot be performed when the number of variables is less than three, the
classical regression equations were used for the prediction of Subtest 2 scores.

Since the square of the multiple correlation coefficient ($R^2$) was given by
the communality of $y$ in these analyses, the standard error of estimate ($SEE$) was
computed using the formula

$$SEE = s_y \sqrt{1 - R^2} .$$


Cross-validation. Since this study was a real-data simulation of various
testing strategies, the regression equations developed from students' subtest
scores during any one academic quarter were used in the inter-subtest branching
strategy simulated from students' item responses from that same quarter. As
with any application of multiple regression techniques, the estimates of the
b-weights and the multiple correlation coefficient were likely to be inflated
due to sample-specificity. To the extent that this was true, the inter-subtest
branching strategy would be nonoptimal for any subsequent sample of students.

To investigate the extent to which variance in the multiple correlation
coefficients and the b-weights affected the efficacy of the inter-subtest
branching strategy employed here, a double-cross-validation design was used. Both
the fall and winter quarter samples served as independent development groups,
and both sets of regression equations (classical and corrected) were obtained
separately for each group. Then, the equations developed from the fall data
were used in the simulation with the data from both the fall and winter quarters
and correspondingly for the equations developed from the winter data. The
results obtained in this way allowed for a direct investigation of the extent to
which the efficacy of the adaptive strategies was affected by cross-sample
discrepancies in the regression equations.

Adaptive  Intra-Subtest  Item  Selection 

Brown and Weiss (1977) compared the results obtained from the entire
testing strategy combining both intra-subtest item selection and inter-subtest
branching with those obtained when the tests were conventionally administered.
In this study the effects of the variable termination criterion in the
intra-subtest item selection strategy were separated from those of the inter-subtest
branching strategy, and the relative contributions of these aspects of the
adaptive strategy were determined.

Consequently, a third set of testing conditions was simulated. Here, the
five subtests were treated as independent sets of items. Instead of branching
from one subtest to the next using the regression-based inter-subtest branching
strategy, each subtest was considered to be a self-contained test. As in the
conventional test, Bayesian scoring was used; and a mean of 0.0 with a variance
of 1.0 was used as the initial prior $\theta$ for each of the five subtests. Items
within each subtest, however, were selected according to the intra-subtest item
selection scheme described above, and the variable termination information
criterion values of .01 and .05 were used. Hence, the only difference between
these tests and the other sets of adaptive tests was that inter-subtest
branching was not utilized here.

Dependent  Variables 

The  important  question  in  this  study  was  not  "Can  test  length  be  reduced 
by  adaptive  testing?"  but  rather  "Can  test  length  be  reduced  and  adequate  levels 
of  measurement  precision  be  maintained?"  It  would  be  pointless  to  reduce  test 
length by 20%, 30%, or more if much of the measurement accuracy was sacrificed
in  the  process. 

Correlations  of  Achievement  Level  Estimates 

One  means  of  investigating  the  extent  to  which  measurement  precision  was 
preserved  or  lost  by  the  adaptive  testing  strategy  is  correlational  analysis; 
that  is,  how  well  did  the  achievement  estimates  on  the  adaptive  tests  correlate 
with  those  on  the  conventional  tests?  For  this  study  these  correlations  were 
obtained  for  each  of  the  subtests  across  all  testing  conditions. 
Information

The degree to which measurement precision is lost through test-length
reduction may also be assessed by inspection of the relevant subtest information
curves. The adaptive subtest information curves were obtained as follows:

A student's final $\hat{\theta}$ was obtained for any one subtest after testing
terminated for that subtest. Then, the item information function (Birnbaum, 1968)
was evaluated at that student's final $\hat{\theta}$ for each item that was administered
adaptively. These item information values were then summed across all items
administered to the student in that subtest in order to obtain the adaptive
subtest information value for that student.

The conventional subtest information curves were obtained in essentially
the same way, except that the item information functions were evaluated at the
$\hat{\theta}$ arising from administration of the conventional subtest, and they were summed
over all the items in the subtest pool.

When a final $\hat{\theta}$ had been obtained for every student, the students were
grouped into 20 nonoverlapping intervals on the basis of their $\hat{\theta}$ values from
either the conventional or adaptive test. The mean subtest information value
(over all students within an interval) was obtained for each of the 20 intervals
separately for the conventional and adaptive tests; these mean values were then
plotted at the midpoint of each interval in order to obtain the subtest
information curves.
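
The construction of a subtest information curve can be sketched as follows. The interval limits (-3.0 to 3.0) are an assumption, since the text gives only the number of intervals; the logistic scaling constant D = 1.7 and all function names are likewise assumed for illustration.

```python
import math

D = 1.7  # assumed scaling constant

def item_information(theta, a, b, c):
    """Item information under the three-parameter logistic model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def subtest_information(theta, items):
    """Sum of item information at a student's final theta over the given items."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def information_curve(thetas, infos, lo=-3.0, hi=3.0, n_bins=20):
    """Mean information per theta interval, as (interval midpoint, mean) pairs.

    Estimates outside [lo, hi) are placed in the nearest end interval;
    empty intervals are omitted.
    """
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for t, info in zip(thetas, infos):
        k = min(max(int(math.floor((t - lo) / width)), 0), n_bins - 1)
        sums[k] += info
        counts[k] += 1
    return [(lo + (k + 0.5) * width, sums[k] / counts[k])
            for k in range(n_bins) if counts[k] > 0]

# Hypothetical final estimates and subtest information values for three students.
curve = information_curve([-2.9, -2.8, 0.1], [1.0, 3.0, 5.0])
```

For the adaptive curve, `items` passed to `subtest_information` would be only the items actually administered to that student; for the conventional curve it would be the full subtest pool.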


RESULTS 

Preliminary  Results 


Item  Parameters 


Table 1 presents the means and standard deviations for estimates of the
latent trait item parameters a, b, and c. Also included are the number and
percentage of items from the final exams for which parameter estimates could
be obtained. Individual item parameter estimates, by subtest, are shown in
Appendix Tables A and B for the fall and winter data, respectively.

Table 1 shows that item parameters were obtained for 94% (or 46) of the
49 items available on the fall quarter final exam. This retention rate ranged
from 85% of the items in the Chemistry subtest to 100% of the items in the Cell,
Energy, and Reproduction subtests. The winter quarter final exam exhibited a
somewhat lower retention rate, with 84% (or 31) of the 37 available items
yielding parameter estimates. The Ecology subtest suffered the largest loss (75%
retention), although closer inspection revealed that this was a loss of only
1 of the 4 original items; no subtest lost more than 2 items. In terms of
absolute numbers of items, the winter quarter item pool was somewhat smaller
than that from fall quarter: 31 parameterized items compared to 46.

The overall mean b parameter for the fall quarter item pool (-.22) was
slightly lower than that for the winter quarter pool (.02). The mean a
parameters of 1.80 and 1.81 and the mean c parameter of .40 were essentially
identical for the two pools.

Table 1

Means and Standard Deviations of Normal Ogive Item Discrimination (a),
Difficulty (b), and Lower Asymptote (c) Parameter Estimates for the
Fall and Winter Quarter Final Exams by Subtest

Quarter and          Number of Items            Percent        a             b             c
Subtest          Available Parameterized  Parameterized   Mean     SD   Mean     SD   Mean     SD

Fall
  Chemistry             13            11             85   1.56    .44   -.49    .78    .32    .09
  Cell                   9             9            100   1.84    .41    .23   1.34    .45    .09
  Energy                 9             9            100   2.27    .47   -.05   1.02    .42    .13
  Reproduction          11            11            100   1.64    .57   -.13    .92    .40    .14
  Ecology                7             6             86   1.73    .36   -.80    .67    .44    .07
  Total                 49            46             94   1.80    .51   -.22    .99    .40    .12
Winter
  Chemistry             10             8             80   1.77    .37   -.29    .82    .29    .07
  Cell                   6             6            100   1.69    .26   -.09   1.06    .38    .07
  Energy                 8             7             88   2.22    .49    .21    .79    .45    .14
  Reproduction           9             7             78   1.53    .32    .25   1.22    .47    .11
  Ecology                4             3             75   1.81    .54    .08   1.64    .51    .24
  Total                 37            31             84   1.81    .44    .02   1.00    .40    .14

Ordering  of  Subtests 

The intercorrelations of Bayesian ability estimates from the five subtests
in each quarter are shown in Table 2. For the data from fall quarter, these
inter-subtest correlations ranged from .289 (between Ecology and Energy) to
.433 (between Cell and Chemistry). The range of correlations was somewhat
larger for the winter quarter data; the lowest correlation was .160 (between Cell
and Ecology) and the largest correlation was .496 (between Chemistry and Energy).

Since  the  highest  correlation  was  between  Chemistry  and  Cell  in  the  fall 
data  and  between  Chemistry  and  Energy  in  the  winter  data,  the  Chemistry  subtest 
was  designated  to  be  administered  first  in  each  case;  the  Cell  subtest  was 
administered  second  for  the  fall  quarter  equations  and  the  Energy  subtest  was 
administered  second  for  the  winter  quarter  equations. 


Table 2

Intercorrelations of Bayesian Ability Estimates
on the Five Subtests of the Fall (Below Diagonal)
and Winter (Above Diagonal) Quarter Final Exams

Subtest         Chemistry      Cell    Energy  Reproduction   Ecology

Chemistry              --      .451      .496          .379      .228
Cell                 .433        --      .456          .301      .160
Energy               .412      .370        --          .347      .189
Reproduction         .388      .344      .321            --      .221
Ecology              .387      .302      .289          .302        --

For the fall quarter data, multiple regression equations were obtained using
the Chemistry and Cell subtests as independent variables and each of the other
subtests, in turn, as the dependent variable. Because the Energy subtest had
the highest multiple correlation with these first two subtests, it was chosen
as the third subtest to be administered. This procedure was repeated to select
the fourth and fifth subtests for administration. The same process was carried
out using the winter quarter data.
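The ordering procedure just described can be sketched directly from a correlation matrix. The sketch below assumes the regressions were computed on the same Bayesian ability estimates summarized in Table 2, so that each multiple correlation follows from the intercorrelations alone; the function names are illustrative and are not taken from the report. The first two subtests are passed in as given, because the text designates them by inspection of the highest pairwise correlation.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_r(corr, predictors, criterion):
    """Multiple correlation of one variable with a set of predictors."""
    rxx = [[corr[i][j] for j in predictors] for i in predictors]
    rxy = [corr[i][criterion] for i in predictors]
    beta = solve(rxx, rxy)  # standardized regression weights
    return math.sqrt(sum(w * r for w, r in zip(beta, rxy)))


def order_subtests(corr, first_two):
    """Repeatedly add the subtest best predicted by those already chosen."""
    order = list(first_two)
    remaining = [k for k in range(len(corr)) if k not in order]
    while remaining:
        best = max(remaining, key=lambda k: multiple_r(corr, order, k))
        order.append(best)
        remaining.remove(best)
    return order


NAMES = ["Chemistry", "Cell", "Energy", "Reproduction", "Ecology"]

# Fall quarter intercorrelations (below-diagonal entries of Table 2), symmetrized.
FALL = [
    [1.000, 0.433, 0.412, 0.388, 0.387],
    [0.433, 1.000, 0.370, 0.344, 0.302],
    [0.412, 0.370, 1.000, 0.321, 0.289],
    [0.388, 0.344, 0.321, 1.000, 0.302],
    [0.387, 0.302, 0.289, 0.302, 1.000],
]

fall_order = [NAMES[k] for k in order_subtests(FALL, first_two=(0, 1))]
print(fall_order)
print(round(multiple_r(FALL, [0, 1], 2), 3))  # Energy on Chemistry and Cell
```

With the fall correlations, the multiple correlation of Energy with Chemistry and Cell comes out at about .464, and the greedy selection returns Chemistry, Cell, Energy, Reproduction, Ecology.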

Appendix Table C shows the intermediate classical regression equations
used to choose the order of administration of the subtests for both fall and
winter quarters. For the fall equations the subtests were ordered in the
following sequence: Chemistry, Cell, Energy, Reproduction, and Ecology. For the
winter equations the order was Chemistry, Energy, Cell, Reproduction, and
Ecology.

Table 3 shows the classical (or uncorrected) regression coefficients,
multiple correlation coefficients, and standard errors of estimate for the sets
of regression equations from both the fall and winter data. These equations
were those used for inter-subtest branching.


Table 3

Regression Coefficients, Multiple Correlation Coefficients (R), and
Standard Errors of Estimate (SEE) for the Classical Regression Equations
from the Fall and Winter Quarter Final Exams

Quarter and     Regression Coefficients for Scores on
Criterion          Previously Administered Subtests       Regression
Subtest         Chemistry    Cell  Energy  Reproduction    Constant      R    SEE

Fall
  Cell               .400                                      .137   .433   .680
  Energy             .328    .272                             -.009   .464   .768
  Reproduction       .240    .190    .140                      .204   .455   .707
  Ecology            .221    .110    .089          .128       -.029   .446   .665
Winter
  Energy             .461                                      .056   .496   .637
  Cell               .276            .305                     -.144   .525   .620
  Reproduction       .258    .129    .203                      .134   .432   .761
  Ecology            .102    .026    .052          .103        .112   .278   .595
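As a small illustration of how such an equation produces a predicted score for the next subtest, the sketch below applies the classical fall quarter coefficients from Table 3. The dictionary layout, the function name, and the student's estimates are hypothetical; only the coefficients and constants come from the table.

```python
# Classical fall quarter equations from Table 3:
# criterion -> (weights on previously administered subtests, regression constant)
FALL_CLASSICAL = {
    "Cell": ({"Chemistry": 0.400}, 0.137),
    "Energy": ({"Chemistry": 0.328, "Cell": 0.272}, -0.009),
    "Reproduction": ({"Chemistry": 0.240, "Cell": 0.190, "Energy": 0.140}, 0.204),
    "Ecology": (
        {"Chemistry": 0.221, "Cell": 0.110, "Energy": 0.089, "Reproduction": 0.128},
        -0.029,
    ),
}


def predicted_score(criterion, estimates, equations=FALL_CLASSICAL):
    """Predict the achievement estimate on `criterion` from earlier subtest estimates."""
    weights, constant = equations[criterion]
    return constant + sum(w * estimates[name] for name, w in weights.items())


# Hypothetical student with estimates of 1.0 on Chemistry and 0.5 on Cell.
entry = predicted_score("Energy", {"Chemistry": 1.0, "Cell": 0.5})
print(round(entry, 3))
```

For this hypothetical student the predicted Energy score is -.009 + .328(1.0) + .272(0.5) = .455, which would then serve as the starting estimate for the Energy subtest.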

Corrected  Equations 

The corrected regression coefficients, multiple correlation coefficients,
and standard errors of estimate from the fall and winter final exams are given
in Table 4. The factor loadings and estimates of communalities used to compute
these equations are given in Appendix Table D. It should be noted that the
factor analytic techniques could not be applied, of course, unless there were
at least three variables in the regression equation. Hence, for the cases in
which there were only two variables (that is, one predictor subtest and one
criterion subtest), the classical (or uncorrected) regression equation was used.
Therefore, the first and fifth lines in Table 4 match exactly the first and
fifth lines, respectively, of Table 3.

Table 4

Regression Coefficients, Multiple Correlation Coefficients (R), and
Standard Errors of Estimate (SEE) for the Corrected Regression Equations
from the Fall and Winter Quarter Final Exams

Quarter and     Regression Coefficients for Scores on
Criterion          Previously Administered Subtests       Regression
Subtest         Chemistry    Cell  Energy  Reproduction    Constant      R    SEE

Fall
  Cell               .400                                      .137   .433   .680
  Energy             .538    .446                             -.008   .594   .698
  Reproduction       .345    .279    .216                      .206   .552   .662
  Ecology            .266    .195    .152          .152       -.024   .523   .633
Winter
  Energy             .461                                      .056   .496   .637
  Cell               .416            .461                     -.132   .644   .557
  Reproduction       .296    .230    .295                      .153   .504   .729
  Ecology            .119    .088    .113          .051        .127   .303   .590

Comparison of the entries in Table 3 with those in Table 4 reveals that
the Lawley-Maxwell method of correction for multiple regression equations did
indeed increase the sizes of both the multiple correlation coefficient and the
regression coefficients. Inspection of the fall quarter data, for example,
shows that the corrected multiple correlation coefficients increased from
R = .464, .455, and .446 to R = .594, .552, and .523, respectively; there were
corresponding decreases in the sizes of the standard errors of estimate. The
b-weights also increased in size, with the largest increases occurring in those
equations with the fewest independent variables. For example, when the Energy
subtest was the criterion, the regression coefficients for the Chemistry and
Cell subtests increased from b = .328 and .272 to b = .538 and .446, respectively.

A similar effect was observed with the winter quarter data. Here, the
corrected multiple correlation coefficients increased from R = .525, .432, and .278
to R = .644, .504, and .303, respectively; again, there were corresponding
decreases in the sizes of the standard errors of estimate. All but one of the
b-weights increased in size; the b-weight for the Reproduction subtest in the
final equation decreased from .103 to .051.

Test  Length 


Mean  Test  Length 

Table 5 presents the mean numbers of items administered in each of the
five subtests and in the total test for the conventional test and for the
adaptive test using adaptive intra-subtest item selection but no inter-subtest
branching.

Conventional test. During the actual final exam in each quarter, students
were free to omit any 10 (of 110) items of their choice. To the extent that
students omitted some of the items with ICC parameters that were selected for
inclusion in these simulation item pools (i.e., from the five content areas:
Chemistry, Cell, Energy, Reproduction, and Ecology), the number of items for
which student responses were available varied across students. Thus, in these
five content areas, students answered from 37 to 46 of the parameterized items
in fall and 23 to 31 items in winter. Consequently, the conventionally
administered test was, on the average, 43 items long for the fall quarter data and
28.55 items long for the winter data.


Table 5

Number of Items Administered in the Five Subtests of the Fall and
Winter Quarter Final Exams with No Inter-Subtest Branching

                                           Adaptive Intra-Subtest Item Selection
                                                   Termination Criterion
Subtest            Conventional Test             .01                    .05
                               Range                  Range                  Range
and Data           Mean    SD  Min  Max   Mean    SD  Min  Max   Mean    SD  Min  Max

Chemistry
  Fall            10.21   .91    6   11   9.13  1.41    5   11   8.09  1.59    4   11
  Winter           7.48   .72    4    8   6.59  1.16    3    8   5.85  1.16    2    8
Cell
  Fall             8.50   .71    5    9   6.93   .89    3    8   5.68  1.10    3    7
  Winter           5.64   .60    3    6   4.73   .85    2    6   4.26   .71    2    5
Energy
  Fall             8.09   .95    4    9   5.96  1.03    3    9   5.15   .88    2    8
  Winter           5.91  1.01    2    7   4.67   .95    2    7   4.30  1.03    2    7
Reproduction
  Fall            10.46   .84    7   11   8.78  1.08    4   11   7.67  1.33    4   10
  Winter           6.69   .56    3    7   4.93  1.09    1    7   4.04   .80    1    5
Ecology
  Fall             5.73   .50    3    6   5.24   .74    2    6   4.07  1.20    2    6
  Winter           2.82   .38    2    3   1.95   .21    1    2   1.07   .26    1    2
Total Test
  Fall            43.00  1.77   37   46  36.04  2.46   28   42  30.67  3.17   22   41
  Winter          28.55  1.60   23   31  22.87  2.47   14   29  19.52  2.12   12   26

The discrepancy between the two quarters in the numbers of items available
in the conventional test for this study was fairly evenly distributed across
all five subtests, so that the relative size of each subtest remained about the
same (see Table 1). That is, Chemistry and Reproduction were the longest
subtests, and Ecology was consistently the shortest.

Adaptive intra-subtest item selection. In these sets of tests, the
intra-subtest item selection strategy was employed with a variable termination
criterion, but no inter-subtest branching scheme was used. That is, a prior θ of
0.0 with an estimated variance of 1.0 was used as an entry point in each of
the five subtests. Table 5 shows data on test lengths obtained for each
subtest under the two termination criteria used in this study (item information
of .01 and .05). During the fall quarter the length of the total test battery
averaged 36.04 items under the more stringent termination criterion, .01, and
30.67 items under the termination criterion of .05. For winter quarter these
figures were 22.87 and 19.52, respectively.

In all cases the maximum number of items administered under this adaptive
strategy represented some reduction in total test battery length. For the fall
data no student answered more than 42 items under the .01 termination criterion
and the shortest adaptive test was only 28 items long. For the .05 criterion
the longest test was 41 items; the shortest was 22. For the winter quarter data
these figures were 29 and 14 for the .01 termination criterion and 26 and 12
for the .05 criterion.
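The effect of the two termination criteria can be illustrated with a minimal sketch of the stopping rule. Several parts are assumptions made for illustration rather than details from this section: the item information formula is the general expression P'(θ)² / (P·Q) applied to a three-parameter normal ogive item, the next item is taken to be the most informative remaining item at the current θ estimate, and the Bayesian updating of θ between items is omitted. The item parameters in the example are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def item_information(theta, a, b, c):
    """Information of a three-parameter normal ogive item at theta."""
    z = a * (theta - b)
    p = c + (1.0 - c) * normal_cdf(z)
    if p <= 0.0 or p >= 1.0:
        return 0.0  # numerically saturated item: no usable information
    slope = (1.0 - c) * a * normal_pdf(z)
    return slope * slope / (p * (1.0 - p))


def next_item(theta, pool, criterion):
    """Index of the most informative remaining item, or None to terminate.

    `pool` maps an item index to its (a, b, c) parameters. The subtest ends
    when no remaining item provides at least `criterion` information.
    """
    if not pool:
        return None
    best = max(pool, key=lambda i: item_information(theta, *pool[i]))
    if item_information(theta, *pool[best]) < criterion:
        return None
    return best


# Hypothetical three-item pool, evaluated at a current estimate of 0.0.
pool = {0: (1.8, 0.0, 0.4), 1: (1.0, 1.5, 0.4), 2: (2.0, 5.0, 0.4)}
first = next_item(0.0, pool, criterion=0.05)
remaining = {i: p for i, p in pool.items() if i != first}
print(first, next_item(0.0, remaining, 0.01), next_item(0.0, remaining, 0.05))
```

In this example the second item carries roughly .025 information at θ = 0, so it would still be administered under the .01 criterion but not under the .05 criterion. That is the mechanism by which the less stringent criterion yields shorter tests.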

Inter-subtest branching. When the inter-subtest branching strategy was
employed in addition to the adaptive intra-subtest item selection strategy and
variable termination criterion, test length was reduced even further. Tables
6 and 7 show the mean test lengths under these conditions, when both the
classical and corrected regression equations were developed on the data from the
fall and winter quarters, respectively. Data for the Chemistry subtest (the
first subtest administered) are the same in the two tables because the initial
θ was assumed to be 0.0 with a variance of 1.0 for all students and was constant
for the first subtest, regardless of branching strategy used (e.g., no branching
versus inter-subtest branching).

For both the .01 and .05 termination criteria, the addition of the
inter-subtest branching strategy generally resulted in shorter tests; the exception
was the Ecology subtest with a .05 termination criterion under all testing
conditions. However, in comparison to the results from use of intra-subtest
branching only (see Table 5), this reduction was slight, never more than one item
for the total test. The data also show that the branching strategy utilizing the
corrected regression equations resulted in tests that were shorter than when
the classical regression equations were used, although the difference was very
slight. For example, under the .01 termination criterion, the classical fall
quarter regression equations resulted in a total test battery length of 35.61
items for the fall data and 35.15 items when the corrected regression equations
were used (Table 6). When the .05 termination criterion was used, the
classical fall quarter equations resulted in a mean test battery length of 30.33
items versus 30.10 items for the corrected equations. There was a tendency for
the corrected equations to result in higher standard deviations of numbers of
items administered in the total test than did the classical equations; this
was due to the tendency toward shorter minimum total test lengths. Similar
results were observed when the winter quarter equations were used (see Table 7).

Cross-validation. There was very little difference between total test
lengths in the development groups and in cross-validation; the differences
that were found were usually in the direction of shorter tests when the
regression equations were cross-validated on data from the other quarter. For
example, when the classical regression equations developed on winter quarter
data were applied to those same data, mean test length was 22.64 and 19.90 for
termination criteria of .01 and .05, respectively (see Table 7). When the
cross-validated classical fall quarter equations were applied to the winter
data (Table 6), however, the means were 22.58 and 19.68, respectively. The
results for the classical regression equations applied to the fall quarter data
were mixed. When the results from the sets of corrected equations were
compared, they favored the cross-validated condition whenever a difference was
found.

Note. The results from the winter data are presented before those from fall in this table because the
winter data represent the development group, and the fall data the cross-validation group.

Percent Reduction in Test Length

Table 8 summarizes the percent reduction in the mean number of items
administered in each subtest and in the total test under the various testing
conditions.

Adaptive intra-subtest item selection. The first column of data in
Table 8 represents the reduction in mean test length that was observed when
only the adaptive intra-subtest item selection strategy with a variable
termination criterion was compared to a conventionally administered test. In both
these adaptive and conventional tests, each subtest was treated as a separate
unit with no inter-subtest branching between tests. For the fall quarter data,
use of the adaptive testing strategy decreased total test length by 16.19% under
the .01 termination criterion and decreased it by as much as 28.67% when the
.05 criterion was used. When this strategy was used on the winter quarter data,
the respective reductions were 19.89% and 31.63% in total test length.

The  largest  reduction  in  subtest  length  using  a  termination  criterion  of  .01 
occurred  for  the  fifth  subtest,  Ecology,  and  amounted  to  a  total  decrease  of 
almost  31%  of  the  items.  This  effect,  however,  was  limited  to  the  winter  data, 
as  the  Ecology  subtest  for  the  fall  data  exhibited  a  reduction  of  less  than  9%. 

On  the  average,  the  Chemistry  subtest  (the  first  subtest  administered)  showed 
the  smallest  decrease  in  number  of  items  administered — about  10  to  12%.  The 
same  pattern  was  observed  among  the  subtests  when  a  termination  criterion  of 
.05  was  used.  That  is,  the  largest  reduction  in  subtest  length  was  observed 
for  the  Ecology  subtest  for  the  winter  data  (62.06%);  and  the  smallest  reduction, 
on  the  Chemistry  subtest  for  the  fall  data  (20.76%). 

Inter-subtest branching. The remaining columns of Table 8 show the
results obtained when the inter-subtest branching scheme was coupled with the
adaptive intra-subtest item selection strategy and then compared to a
conventionally administered test. The reductions in total test length were slightly
greater than those obtained when the inter-subtest branching strategy was not
utilized.

For example, when the fall quarter equations were applied to the fall
quarter data, the reduction in average test length for the total test increased
from 16.19% to 17.19% for the classical equations and 18.26% for the corrected
equations under the .01 termination criterion. These figures were 28.67%,
29.47%, and 30.00%, respectively, for the .05 termination criterion. Use of the
corrected regression equations generally resulted in somewhat shorter total
test lengths than did use of the classical equations, although the difference
was slight.

When the winter quarter equations were applied to the winter quarter data,
the reduction in total test length increased from 19.89% to 20.70% for the
classical equations and 21.40% for the corrected equations under the .01
termination criterion. These figures were 31.63%, 30.30%, and 32.05%,
respectively, for the .05 termination criterion. Use of the classical equations
actually resulted in tests that were slightly longer under the .05 criterion
than when no inter-subtest branching strategy was used. Use of the corrected
equations, however, resulted in shorter tests, as expected.

In general (across both sets of data), additional reduction in test length
was less than three percentage points, and most often one percentage point or
less. Use of the corrected equations resulted in shorter tests in all cases
in comparison with use of adaptive intra-subtest item selection alone. The
Energy subtest showed the largest decreases in test length across testing
conditions (with the exception of the Ecology subtest administered during winter
quarter, which showed the greatest reduction in test length). This was followed
closely by the Cell, Reproduction, and Chemistry subtests, respectively.
During fall quarter the decrease in the length of the Ecology subtest was the
smallest.

Note. Percent reduction was computed by the formula: 100 - [(mean number of
items in appropriate adaptive test / mean number of items in conventional
test) x 100].
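The percent-reduction formula given in the table note (100 minus the ratio of adaptive to conventional mean length, times 100) can be checked against the reported mean test lengths. The sketch below uses the total-test means from Table 5; the function and variable names are illustrative.

```python
def percent_reduction(adaptive_mean, conventional_mean):
    """100 - [(mean adaptive length / mean conventional length) * 100]."""
    return 100.0 - (adaptive_mean / conventional_mean) * 100.0


# Mean total test lengths from Table 5 (no inter-subtest branching).
CONVENTIONAL = {"fall": 43.00, "winter": 28.55}
ADAPTIVE = {
    ("fall", 0.01): 36.04,
    ("fall", 0.05): 30.67,
    ("winter", 0.01): 22.87,
    ("winter", 0.05): 19.52,
}

for (quarter, criterion), mean_length in ADAPTIVE.items():
    pct = percent_reduction(mean_length, CONVENTIONAL[quarter])
    print(f"{quarter:6s} criterion {criterion:.2f}: {pct:.2f}% reduction")
```

These four values reproduce the 16.19%, 28.67%, 19.89%, and 31.63% reductions cited in the text. The same function applied to the fall means with inter-subtest branching (35.61 and 35.15 items against 43.00) gives the 17.19% and 18.26% figures.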

Cross-validation. When the fall quarter equations were applied to the
data from winter quarter in the cross-validation condition, test-length
reduction increased from 19.89% with no inter-subtest branching to 20.91% for the
classical equations and 22.10% for the corrected equations, under the .01
termination criterion. For the termination criterion of .05, these figures were
31.63% with no inter-subtest branching and 31.07% and 32.05% for the two
inter-subtest branching conditions (classical and corrected equations),
respectively. With the winter data there was a slight increase in test length on
cross-validation from 28.67% without inter-subtest branching to 30.30% for the
classical equations and .05 termination criterion.

For the double-cross-validation condition, when the winter quarter
equations were applied to the fall quarter data, reductions in test length were
again observed. For the .01 termination criterion, the reduction in test length
increased from 16.19% without inter-subtest branching to 17.30% for the
classical equations and 18.28% for the corrected equations. These figures were
28.67%, 28.53%, and 30.02%, respectively, for the .05 termination criterion.
Only with the .01 termination criterion were the tests with the cross-validated
equations consistently shorter than the tests with the original (development
group) equations. At the .05 termination level the results from the classical
and corrected equations were mixed.

In summary, for the .01 termination criterion the reduction in total test
length for the data from each of the quarters was nearly always greater when
the regression equations were cross-validated. The results from using the .05
criterion were mixed. As was observed with the two development groups, use of
the corrected equations resulted in shorter mean test lengths under
cross-validation than did use of the cross-validated classical equations. In all
cases, however, observed differences in test length reduction were slight.

Minimum and maximum reductions in test length. The data in Table 8 reflect
only the reductions in average test lengths. Table 9 presents the minimum and
maximum reductions from the conventional test length that were observed for any
one student when the inter-subtest branching strategy was used. Inspection of
this table reveals that for each testing condition (except for the corrected
fall equations applied to the winter data with .01 termination criterion),
total test length was reduced for all students by at least 2.5%. The largest
reduction in total test length was that observed for the fall data using corrected
fall equations and a termination criterion of .05, where the reduction was 67.4%.

For each subtest separately the minimum reduction in subtest length (for
all tests but one) was 0%; that is, there was at least one student who was
administered all the available items in a subtest regardless of testing condition.
However, there also were students whose subtests were reduced in length by more
than 75%. In fact, there were some subtests (specifically, Ecology) that
students "skipped" altogether, as evidenced by the 100% maximum reduction figures
for most of the winter data.

It would be expected that as the tests continued and more information was
available with which to predict scores on subsequent subtests, these predicted
scores (hence, entry points into the subtest) would become more accurate. This
should be reflected in more stable ability estimates and therefore shorter
subsequent subtests. Indeed, there is a trend in the data of Table 9 for
increasingly larger reductions in test length for the tests administered later in the
inter-subtest branching.

Correlations  of  Achievement  Level  Estimates 

Table 10 presents the values of the correlation coefficients (r) between
the Bayesian θ estimates from the conventional tests and the adaptive tests, under
all testing conditions. Generally, these correlations were fairly homogeneous;
more than half of them were greater than .90, while less than 10% of them were
below .80.

Adaptive  Intra-Subtest  Item  Selection 

With no inter-subtest branching, the largest correlations were those
observed for the Cell subtest with variable termination .01 (r = .998 for both
sets of data) and for the Ecology subtest under the same conditions for
winter data, r = .995. The smallest correlation was observed for the Ecology
subtest with a termination criterion of .05; here, the winter data correlation
was r = .527. This appears rather low, but the average length of this adapted
subtest was only 1.07 items (see Table 5).

Inter-Subtest  Branching 

Classical equations. When the classical fall quarter equations were
applied to the data collected from that same quarter, the range of correlations
was fairly small. These correlations ranged from .846 (for the Energy subtest)
to .979 (for the Cell subtest) with the .01 termination criterion. For the
termination criterion of .05, these correlations were .795 (for Energy) and
.890 (for both Reproduction and Ecology).

When the winter quarter equations were applied to the winter data, the
correlations varied even less. For the .01 termination criterion the range
was from .921 (for Reproduction) to .983 (for Chemistry). For the .05 criterion
the range was from .876 (for Reproduction) to .962 (for Chemistry).

In general, the addition of an inter-subtest branching strategy to
adaptive intra-subtest item selection reduced the correlations between
conventional and adaptive subtest scores by a small amount (less than .021 for the fall
data and less than .040 for the winter data). The single exception to this
was for the winter administration of the Ecology subtest (termination criterion
of .05), where inter-subtest branching increased the correlation from .527 to
.886. These reductions in the correlations can be accounted for by the
decreases in number of items with which θ was estimated; the inter-subtest
branching strategy typically reduced test length over that obtained with intra-subtest
item selection alone. This effect can also be seen by comparing the results
from the two termination criteria; the correlations were typically lower for
the .05 criterion, which generally yielded shorter tests.

Corrected equations. The pattern of correlations observed for the tests
using the corrected regression equations paralleled that observed for the
classical equations. That is, the range of correlations was fairly small for
both the fall and winter quarter data sets, ranging from .818 to .979 under
the .01 termination criterion for the fall quarter Energy and Cell subtests,
respectively, and from .770 to .887 under the .05 termination criterion for
the fall quarter Energy and Chemistry subtests, respectively.

For the winter quarter equations applied to the winter data, the range of
conventional-adaptive score correlations was from .889 (for Reproduction) to
.983 (for Chemistry) under the .01 criterion and from .715 (for Ecology) to
.962 (for Chemistry) under the .05 criterion. In all cases, the correlations
obtained using the classical equations were at least as large as, and usually
larger than, those obtained using the corrected regression equations.

Cross-Validation 


Under the cross-validation conditions (when fall equations were applied
to winter data, and vice versa), there was no systematic tendency for the
correlations to be either higher or lower than those obtained in the development
groups. For the sets of classical and corrected equations alike,
cross-validation yielded higher correlations about half the time and lower correlations
the other half. Thus, there appears to be no net decrement or increment in the
accuracy of measurement when regression equations that were developed on one
group were applied in the inter-subtest branching strategy to data for a
different group.


Information 

Appendix Tables E through M present the subtest information curves for
each subtest under the various testing conditions and across the two academic
quarters. It should be noted that since the Chemistry subtest was administered
first each quarter (Table E), the initial Bayesian prior θ and variance were
0.0 and 1.0, respectively, for all students over all testing conditions. Thus,
because the first subtests administered were identical, there were no
differences in the values of the subtest information curves across testing conditions
within one termination criterion.

Adaptive  Intra-Subtest  Item  Selection 

To illustrate the findings with respect to information for the various
testing conditions, Figures 1a and 1b present the information curves for the
fall quarter Cell and Reproduction subtests (see Tables F and H) obtained when
the tests were administered conventionally and with adaptive intra-subtest item
selection (termination criterion of .05). The curves are virtually
indistinguishable in each case. That is, there was little, if any, loss of information
incurred by utilizing an adaptive intra-subtest item selection strategy, even
though previous results indicated that the adaptive tests were shorter than the
conventional tests.


Figure 1

Subtest Information Curves for the Fall Quarter Cell and Reproduction
Subtests Administered Conventionally, with Intra-Subtest
Item Selection and Inter-Subtest Branching

(a) Cell (vertical axis: Subtest Information)

For the Cell subtest (Figure 1a) there was a slightly larger separation
between the curves above the point at which the curves were peaked, with the
adaptive test slightly lower than the conventional test; this pattern is not
evident in Figure 1b. The differences observed in these figures were even
smaller when the more stringent termination criterion (.01) was used (see Tables
F and H).

Inter-Subtest  Branching 

Classical equations. Also included in Figures 1a and 1b are the information
curves obtained using an inter-subtest branching strategy with the classical
fall equations and a termination criterion of .05. There is, again, minimal
separation among the curves, particularly for the Reproduction subtest.
As before, the curves begin to differ for the Cell subtest in the upper tail,
with the inter-subtest branching strategy resulting in higher information values
than the other two strategies.

Corrected equations. For both the fall and winter data the information
curves obtained using the corrected equations were nearly always lower than the
curves obtained with the classical equations. While this difference was small,
it was consistent across all five subtests for each quarter (see Tables F
through M).

Cross-Validation 


When the classical regression equations were used on the fall data,
subtest information was slightly, though systematically, higher under
cross-validation than for the development groups. That is, applying winter
quarter equations to fall quarter data yielded higher levels of information, on
the average, than did applying the fall quarter equations to the fall data. This
effect was consistent across all five subtests for the fall data. For the
winter data, the results were mixed.

When the corrected regression equations were used in cross-validation,
the results were mixed for both sets of data. For about half of the subtests,
there was a small increase in information, and for the rest of the subtests
there was a small decrease in information; thus, there was no net change in
information on cross-validating with the corrected equations. In all cases,
differences between mean information levels across the various testing
conditions were slight.


DISCUSSION 

This paper has endeavored to replicate previously reported findings
(Brown & Weiss, 1977) that a combination of adaptive intra-subtest item
selection and inter-subtest branching strategies could significantly reduce the
length of an achievement test battery with only minimal loss in
psychometric test information. The present study applied this adaptive
testing strategy to the responses from a conventionally administered classroom
exam and separated out the effects of adaptive intra-subtest item selection and
inter-subtest branching on test length and test information. In addition, this
paper investigated the effects of using an adaptive testing strategy developed
from one set of data on a different data set using a double-cross-validation
design.

Adaptive Intra-Subtest Item Selection

The adaptive intra-subtest item selection strategy used in this study was
identical to that utilized by Brown and Weiss (1977); that is, items were
selected on the basis of the amount of psychometric information available at the
current level of θ. Although the θ estimates would most appropriately be
obtained using a maximum likelihood scoring strategy, this strategy utilized a
Bayesian scoring approach. Maximum likelihood scoring requires the
availability of at least one correct and one incorrect response before a θ
estimate can be generated, whereas the Bayesian routine has no such requirement.
With the possibility of a very small number of items being administered in any
one subtest, and the necessity of scoring responses after each item, a maximum
likelihood method would be nonoptimal for this testing strategy.
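
The contrast between the two scoring methods can be illustrated with a minimal
sketch. It assumes a three-parameter normal ogive model and replaces the
Bayesian routine used in the study with a simple grid approximation of the
posterior mean and variance; the function names and the grid are assumptions
made for illustration, and the item parameters are those of Chemistry items 1
and 2 in Table A. With two correct responses the likelihood keeps rising as θ
increases, so no finite maximum likelihood estimate exists, whereas the
posterior mean is finite.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter normal ogive: c + (1 - c) * Phi(a * (theta - b))."""
    phi = 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))
    return c + (1.0 - c) * phi

def likelihood(theta, items, responses):
    """Likelihood of a 0/1 response pattern at a given theta."""
    like = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        like *= p if u == 1 else (1.0 - p)
    return like

def bayes_estimate(items, responses, prior_mean=0.0, prior_var=1.0):
    """Grid approximation (an assumption) of the posterior mean and variance."""
    grid = [-4.0 + 0.01 * k for k in range(801)]
    weights = [
        math.exp(-0.5 * (t - prior_mean) ** 2 / prior_var)
        * likelihood(t, items, responses)
        for t in grid
    ]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, var

# Chemistry items 1 and 2 (a, b, c) from Table A; both answered correctly.
items = [(1.76, 0.87, 0.37), (1.60, -0.68, 0.27)]
all_correct = [1, 1]

post_mean, post_var = bayes_estimate(items, all_correct)
grid_like = [likelihood(t, items, all_correct) for t in (-2.0, 0.0, 2.0, 4.0)]

print("posterior mean and variance:", post_mean, post_var)
print("likelihood at theta = -2, 0, 2, 4:", grid_like)
```

Because the prior (mean 0.0, variance 1.0) bounds the estimate, a score is
available after every item, which is what the item-by-item selection requires.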


Kingsbury and Weiss (1979) illustrated the extent to which these two scoring
methods, when applied to the same set of data, yield scores that are
numerically discrepant. The issue of the appropriate choice of scoring strategy
pervades implementations of ICC test theory in general and hence is not confined
to this particular implementation of an adaptive testing strategy.
Nevertheless, it is not known to what extent the results reported here would
have changed had the scoring routine been different.


As Table 8 indicates, most of the reduction in test length was due to the
variable termination criterion of the intra-subtest item selection strategy.
Although test length decreased, the conventional-adaptive test score
correlations remained high (often close to 1.00; see Table 10), and there was
virtually no loss in the amount of psychometric information available for each
subtest. It is clear from these data that subtest length can be reduced from 16%
to 32%, with minimal loss in measurement accuracy and precision, simply by
omitting those items which add little information to the measurement process.
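
The selection and termination rule can be sketched as follows. This is an
illustrative sketch, not the program used in the study: it assumes the
three-parameter normal ogive model, computes item information as the squared
slope of the item response function divided by P(1 - P), and uses the fall
quarter Chemistry parameters from Table A. The function names are hypothetical,
and θ is held fixed here, whereas in the study θ was re-estimated after each
item.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def item_information(theta, a, b, c):
    """Information of a three-parameter normal ogive item at theta."""
    z = a * (theta - b)
    p = c + (1.0 - c) * normal_cdf(z)
    slope = (1.0 - c) * a * normal_pdf(z)  # derivative of P with respect to theta
    denom = p * (1.0 - p)
    return slope * slope / denom if denom > 0.0 else 0.0

# Fall quarter Chemistry items from Table A: item number -> (a, b, c).
# Items 7 and 12 were rejected during parameter estimation.
CHEMISTRY = {
    1: (1.76, 0.87, 0.37), 2: (1.60, -0.68, 0.27), 3: (1.39, -1.41, 0.49),
    4: (1.55, 0.33, 0.32), 5: (0.77, -0.66, 0.15), 6: (1.54, -0.56, 0.30),
    8: (1.98, -0.78, 0.28), 9: (2.36, -0.60, 0.23), 10: (0.92, -0.93, 0.30),
    11: (1.66, -1.57, 0.36), 13: (1.67, 0.63, 0.39),
}

def next_item(theta, remaining, criterion):
    """Return the most informative remaining item, or None to terminate."""
    if not remaining:
        return None
    best = max(remaining, key=lambda k: item_information(theta, *remaining[k]))
    if item_information(theta, *remaining[best]) < criterion:
        return None  # no remaining item reaches the termination criterion
    return best

def items_meeting(theta, pool, criterion):
    """Items whose information at theta is at least the criterion."""
    return sorted(k for k, abc in pool.items()
                  if item_information(theta, *abc) >= criterion)

for criterion in (0.05, 0.01):
    print(criterion, items_meeting(0.0, CHEMISTRY, criterion))
```

Every item that meets the .05 criterion also meets the .01 criterion, so the
less stringent criterion can only shorten the subtest further.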


Inter-Subtest Branching


Utilization of prior information in the estimation of achievement levels
further decreased test length by less than 5%, and most often by 1% or less.
Although this additional effect was small, it appeared to be fairly consistent
across types of regression equations and sets of data; that is, in nearly all
cases the addition of the inter-subtest branching strategy resulted in some
further reduction in test length.
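
One way to see why prior information can shorten a subtest is sketched below.
The sketch assumes that the regression prediction from earlier subtests serves
as the Bayesian prior mean for the next subtest and that the squared standard
error of estimate serves as the prior variance; both the function and the
numerical weights are hypothetical and are not taken from the regression
tables.

```python
def branching_prior(intercept, weights, earlier_estimates, multiple_r,
                    criterion_sd=1.0):
    """Prior mean and variance for the next subtest (illustrative assumption).

    mean     = regression prediction from earlier achievement estimates
    variance = squared standard error of estimate, sd^2 * (1 - R^2)
    """
    mean = intercept + sum(w * t for w, t in zip(weights, earlier_estimates))
    variance = criterion_sd ** 2 * (1.0 - multiple_r ** 2)
    return mean, variance

# Hypothetical weights and estimates for two previously administered subtests.
prior_mean, prior_var = branching_prior(
    intercept=0.0, weights=(0.4, 0.3), earlier_estimates=(1.0, 0.5),
    multiple_r=0.6,
)
print(prior_mean, prior_var)
```

Under these assumptions the prior variance falls below the 1.0 used when no
prior information is available, so the starting estimate is already more
precise and fewer items may reach the termination criterion.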

Brown  and  Weiss  (1977)  reported  an  average  decrease  in  the  length  of  their 
test  battery  of  approximately  50%.  The  largest  decrease  in  the  present  study 
was  approximately  32%,  and  that  was  obtained  with  a  termination  criterion  (.05) 
less  stringent  than  the  one  used  in  the  former  study.  Part  of  this  discrepancy 
may  lie  in  the  number  of  items  available  in  each  subtest  and  in  the  total  test. 
In  the  earlier  study,  each  subtest  was  between  12  and  24  items  long,  and  the 
entire  battery  contained  201  items.  The  biology  tests  used  in  the  present 
study,  however,  were  much  shorter,  with  a  total  of  only  49  items  during  fall 
quarter  and  37  items  during  winter  quarter;  the  lengths  of  the  subtests  were 
correspondingly  small.  It  seems  reasonable  that  the  longer  subtests  in  the 
Brown  and  Weiss  study  contained  much  redundant  information  and  that  this  would 
naturally  lead  to  larger  reductions  in  test  length. 


It would be interesting to compare between studies the extent to which
inter-subtest branching reduced test length over and above that obtained by
intra-subtest item selection alone. Unfortunately, Brown and Weiss (1977) did
not present that information. More research is needed to determine how
representative the present figure of 5% is across different data sets.

When Brown and Weiss computed the conventional-adaptive test score
correlations, they found that most of them were above .90, with only 1 of their
12 correlations dropping below that value. There was a greater range for these
correlation coefficients in the present study, although here, too, most of them
were greater than .90. The lengths of the subtests varied across the two
studies, so direct comparison of the correlation coefficients is difficult. The
correlations obtained in the previous study may have been larger than those in
the present one, but the adapted subtests were typically longer as well. The
larger correlations are therefore very likely due, at least in part, to
part-whole correlation, which necessarily increases with the size of the
smaller (adapted) part.
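
The part-whole effect can be demonstrated with a small simulation. The sketch
below uses synthetic data only: 20 statistically independent items answered at
random, so that any correlation between a part score and the total score is due
entirely to the shared items. For such items the expected part-whole
correlation is the square root of k/n, where k is the number of items in the
part and n the number in the whole test.

```python
import math
import random

def correlation(u, v):
    """Pearson product-moment correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

rng = random.Random(0)
N_EXAMINEES, N_ITEMS = 4000, 20

# Synthetic 0/1 responses to independent items (not the biology exam data).
responses = [[rng.randint(0, 1) for _ in range(N_ITEMS)]
             for _ in range(N_EXAMINEES)]
whole = [sum(r) for r in responses]

part_whole = {
    k: correlation([sum(r[:k]) for r in responses], whole)
    for k in (2, 5, 10, 15)
}
for k, r in part_whole.items():
    print(k, round(r, 3), "expected about", round(math.sqrt(k / N_ITEMS), 3))
```

Real achievement items are positively intercorrelated, so actual part-whole
correlations would be higher than in this sketch, but the dependence on the
length of the part remains.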

Both of these studies concluded that there was minimal loss in the amount
of psychometric information observed in each subtest. Brown and Weiss
utilized termination criteria of .01 and .001; it is interesting to note that
the same conclusion was reached in the present study, which utilized termination
criteria that were much less stringent (.05 and .01).

Corrected  Regression  Equations 

The use of Lawley and Maxwell's (1973) correction for error in the
independent variables in multiple regression increased the value of the multiple
correlation coefficient and the regression coefficients (see Tables 3 and 4).
The important issue here, however, was whether this correction affected test
length, and accuracy and precision of measurement. On the average, use of the
corrected equations decreased test length slightly more than did use of the
classical equations. It was impossible to detect any large difference in this
data set, however, because there was such a small additional reduction in test
length attributable to any kind of inter-subtest branching.
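
Why a correction for error in the predictors raises the regression
coefficients can be shown with a single-predictor simulation. The sketch below
is the textbook errors-in-variables correction, in which the known error
variance is subtracted from the observed predictor variance; it is not Lawley
and Maxwell's multivariate procedure, and the data, slope, and error variance
are synthetic assumptions.

```python
import random

def moments(x, y):
    """Sample variance of x and sample covariance of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, sxy

def slope_classical(x, y):
    sxx, sxy = moments(x, y)
    return sxy / sxx

def slope_corrected(x, y, error_variance):
    """Remove the error variance from the predictor variance before dividing."""
    sxx, sxy = moments(x, y)
    return sxy / (sxx - error_variance)

rng = random.Random(0)
TRUE_SLOPE, ERROR_VARIANCE = 0.8, 0.5

true_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
observed = [t + rng.gauss(0.0, ERROR_VARIANCE ** 0.5) for t in true_scores]
criterion = [TRUE_SLOPE * t + rng.gauss(0.0, 0.5) for t in true_scores]

classical = slope_classical(observed, criterion)
corrected = slope_corrected(observed, criterion, ERROR_VARIANCE)
print("classical:", classical, "corrected:", corrected)
```

The classical slope is attenuated toward zero by the error in the predictor,
and the corrected slope is larger and closer to the generating value.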

The average correlations between the adaptive and conventional
achievement estimates were lower when the corrected equations were used than
when the classical equations were used. Although this is puzzling in light of
the data in Tables 3 and 4, it becomes less so considering the fact that the
corrected equations typically resulted in shorter test lengths. At least part of
the discrepancies among the correlation coefficients can be attributed to the
discrepancies in test lengths. It is not clear, however, just how much is
artifactual and how much is due to a genuine difference in the way the levels of
achievement were estimated.

Additionally, mean information values obtained using the corrected
regression equations were typically lower than those obtained with the classical
equations. At least part of this difference may be attributable to the
shorter test lengths that accompanied the corrected equations, although, again,
the extent to which this is true is not known.

Cross-Validation 


In this study the regression equations for the inter-subtest branching
strategies were developed from data from two different academic quarters.
These equations were then applied to the data from the other quarter in a
double-cross-validation design to investigate the extent to which the equations,
and hence the inter-subtest branching strategies, were sample-specific. This
was done for both the classical and corrected sets of equations.
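
The double-cross-validation design can be sketched with synthetic data. In the
sketch below, two samples stand in for the fall and winter quarters, a
two-predictor least squares equation is developed in each, and each equation is
then applied both to its own sample and to the other one. The data generator,
sample sizes, and function names are assumptions for illustration; the outcome
shown is the correlation between predicted and observed criterion scores, not
the test length, conventional-adaptive correlation, or information measures
examined in the study.

```python
import math
import random

def simulate_quarter(n, seed):
    """Synthetic (x1, x2, y) rows sharing a common ability component."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        rows.append(tuple(ability + rng.gauss(0.0, 0.6) for _ in range(3)))
    return rows

def fit(rows):
    """Least squares intercept and weights for y on x1 and x2."""
    n = len(rows)
    m1, m2, my = (sum(r[i] for r in rows) / n for i in range(3))
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def validity(equation, rows):
    """Correlation between predicted and observed criterion scores."""
    b0, b1, b2 = equation
    pred = [b0 + b1 * r[0] + b2 * r[1] for r in rows]
    obs = [r[2] for r in rows]
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    spo = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    spp = sum((p - mp) ** 2 for p in pred)
    soo = sum((o - mo) ** 2 for o in obs)
    return spo / math.sqrt(spp * soo)

fall, winter = simulate_quarter(800, seed=1), simulate_quarter(800, seed=2)
eq_fall, eq_winter = fit(fall), fit(winter)

results = {
    ("fall equations", "fall data"): validity(eq_fall, fall),
    ("winter equations", "fall data"): validity(eq_winter, fall),
    ("winter equations", "winter data"): validity(eq_winter, winter),
    ("fall equations", "winter data"): validity(eq_fall, winter),
}
for key, r in results.items():
    print(key, round(r, 3))
```

For this particular index the development equation can never be outperformed
on its own sample, so only shrinkage can be observed; the measures reported in
the study carry no such constraint, which is why cross-validation could yield
either higher or lower values there.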

In terms of test length, the cross-validation groups typically were
administered shorter tests than were each of the development groups. This was
true in nearly all cases under the .01 termination criterion; results were mixed
for the .05 criterion.

The accuracy of measurement, as indexed by the correlation between
conventional and adaptive test scores, was not systematically affected by the
cross-validation procedure employed here. That is, cross-validating yielded
higher correlations about half the time and lower correlations the other half,
regardless of whether the classical or corrected equations were used. The
precision of measurement (i.e., subtest information) increased slightly under
cross-validation over that observed for the development groups, at least for the
winter quarter and some of the fall sets of classical equations; results were
mixed for the corrected equations.

The increases in accuracy and precision of measurement under
cross-validation, though slight, are contrary to expectations, since
cross-validating yielded shorter mean test lengths as well. Therefore, the
increase in measurement accuracy and precision cannot be accounted for by test
length changes.

CONCLUSIONS 

The real-data simulation reported here replicated and extended the
findings reported by Brown and Weiss (1977). That is, the results from this
study show that test length could be reduced by 20%-30% using Brown and Weiss's
adaptive testing strategy for achievement testing batteries. Reduced time in
testing means more time available to be spent in other activities, such as
additional instruction.

The level of reduction in test length depended directly on the size of
the termination criterion employed. The termination criteria used here were
minimum item information of .05 and .01; Brown and Weiss used a value of .01
in their study. Clearly, the choices for termination were arbitrary, and the
results might have been different, depending on the value chosen. More
research is needed to determine optimal termination criteria.

The design of this study permitted the separation of the effects due to
the intra-subtest item selection procedure from those due to inter-subtest
branching. Results from this study show that most of the reduction in test
length could be attributed to the adaptive intra-subtest item selection method
and variable termination criterion. When this strategy was coupled with
inter-subtest branching, an additional reduction in test length of only up to 5%
was observed. More research is needed to determine the specific characteristics
of the item pool which would contribute to greater reductions in test length
when the inter-subtest branching strategies are used.

Achievement level estimates obtained adaptively correlated quite highly
with those obtained from a conventional administration of the subtests. It was
only when the subtests were very short (fewer than three items) that low
correlations were observed.

As was observed in the Brown and Weiss (1977) study, there was a minimal
loss in the amount of psychometric information available in the subtests due to
adaptive testing. This was evident in the close correspondence between the
information curves for the adaptive and conventional tests.

Perhaps the most important finding from this research was that the
regression equations obtained from one set of data could be used to adapt the
testing for a different group of students and that the observed test
characteristics for this cross-validated group closely paralleled the results
obtained from the development group. This result directly reflects what would
actually happen in a live-testing implementation of this adaptive testing
strategy; that is, the regression equations used for inter-subtest branching
would be obtained from one group of students and applied in the testing of a
different group of students. This study has shown that such a procedure can be
utilized while still maintaining the quality of test characteristics observed
for the original group on which the regression equations were developed. Of
course, more research is needed to determine the generality of these findings in
other situations.

Although this study has replicated and extended some of the findings
reported by Brown and Weiss (1977), it was limited by the fact that it, too, was
a real-data simulation study. The next step in research on this adaptive
testing strategy should be the implementation of this adaptive testing strategy
in a live-testing situation, thus enabling researchers to evaluate the validity
of the findings from these simulation studies. In addition, more research is
needed to determine the generality of these findings across other test
batteries and other testing situations.

APPENDIX: SUPPLEMENTARY TABLES


Table A

Normal Ogive Item Discrimination (a), Difficulty (b),
and Lower Asymptote (c) Parameter Estimates for the
Fall Quarter Final Exam, by Subtest

Subtest and Item       a        b        c

Chemistry
     1               1.76      .87      .37
     2               1.60     -.68      .27
     3               1.39    -1.41      .49
     4               1.55      .33      .32
     5                .77     -.66      .15
     6               1.54     -.56      .30
     7                 -        -        -
     8               1.98     -.78      .28
     9               2.36     -.60      .23
    10                .92     -.93      .30
    11               1.66    -1.57      .36
    12                 -        -        -
    13               1.67      .63      .39

Cell
     1               1.48      .63      .43
     2               2.53     3.01      .59
     3               1.84     1.68      .49
     4               1.79     -.28      .32
     5               2.08     -.87      .34
     6               1.82     -.70      .40
     7               2.26     -.48      .54
     8               1.17      .12      .51
     9               1.58    -1.02      .41

Energy
     1               2.77      .06      .29
     2               1.99     -.83      .59
     3               2.01     1.41      .43
     4               1.68     -.19      .59
     5               1.74     1.10      .38
     6               2.73      .45      .22
     7               2.04      .36      .40
     8               2.93    -1.58      .50
     9               2.54    -1.26      .34

Reproduction
     1               1.18     0.00      .46
     2               1.69     -.76      .40
     3               1.47      .54      .49
     4                .73     -.24      .34
     5               1.40     2.03      .57
     6               2.28    -1.36      .61
     7               1.08     -.53      .21
     8               2.41    -1.05      .25
     9               1.79     -.07      .30
    10               2.53     -.33      .24
    11               1.52      .38      .53

Ecology
     1               1.58    -1.35      .38
     2               1.45    -1.19      .47
     3               2.36    -1.64      .55
     4               1.66     -.33      .36
     5                 -        -        -
     6               1.91     -.14      .41
     7               1.42     -.15      .48

Note. Missing entries indicate that the item was rejected in the first
phase of item parameter estimation.


Table B

Normal Ogive Item Discrimination (a), Difficulty (b),
and Lower Asymptote (c) Parameter Estimates for the
Winter Quarter Final Exam, by Subtest


Subtest and Item
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Table  D 

Factor Loadings and Communality Estimates for Maximum Likelihood
Factor Analyses of Fall and Winter Quarter Final Exams

Subtest              A*       h2

Fall Quarter

Two Independent Variables (Criterion Subtest = Energy)
  Energy            .594
  Chemistry         .693
  Cell              .624

Three Independent Variables (Criterion Subtest = Reproduction)
  Reproduction      .552
  Chemistry         .698
  Cell              .623
  Energy            .590

Four Independent Variables
  Ecology           .523     .274
  Chemistry         .712     .506
  Cell              .611     .374
  Energy            .581     .338
  Reproduction      .555     .309

Winter Quarter

Two Independent Variables
  Cell              .644     .415
  Chemistry         .701     .491
  Energy            .707     .501

Three Independent Variables (Criterion Subtest = Reproduction)
  Reproduction      .504
  Chemistry         .717
  Energy            .700
  Cell              .634

Four Independent Variables
  Ecology           .303     .092
  Chemistry         .722
  Energy            .694     .481
  Cell              .628     .394
  Reproduction               .264


Table E

Mean Information Values (I) at Estimated Achievement Level (θ̂) Intervals
for the Chemistry Subtest of the Fall and Winter Quarter Final Exams
for the Conventional Test and the Adaptive Test Using Only Intra-Subtest
Item Selection with Two Termination Criteria


Fall  Winter
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